Overview

Dataset statistics

Number of variables: 20
Number of observations: 2666
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 398.5 KiB
Average record size in memory: 153.0 B

Variable types

NUM: 15
BOOL: 3
CAT: 2

Warnings

State has a high cardinality: 51 distinct values [High cardinality]
Total day charge is highly correlated with Total day minutes [High correlation]
Total day minutes is highly correlated with Total day charge [High correlation]
Total eve charge is highly correlated with Total eve minutes [High correlation]
Total eve minutes is highly correlated with Total eve charge [High correlation]
Total night charge is highly correlated with Total night minutes [High correlation]
Total night minutes is highly correlated with Total night charge [High correlation]
Total intl charge is highly correlated with Total intl minutes [High correlation]
Total intl minutes is highly correlated with Total intl charge [High correlation]
Number vmail messages has 1933 (72.5%) zeros [Zeros]
Customer service calls has 555 (20.8%) zeros [Zeros]
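
The minutes/charge warnings are consistent with each charge being close to a fixed per-minute rate. The following is a minimal sketch of that check, using the Total day minutes and Total day charge values read from the ten "First rows" of the Sample section. The per-minute rate is inferred from those ten rows only; the report itself does not state it.

```python
# Values read from the "First rows" sample (rows 0-9) of this report.
day_minutes = [265.1, 161.6, 243.4, 299.4, 166.7, 223.4, 218.2, 157.0, 258.6, 187.7]
day_charge = [45.07, 27.47, 41.38, 50.90, 28.34, 37.98, 37.09, 26.69, 43.96, 31.91]

# Charge per minute for each row; a near-constant ratio means the two
# columns carry almost the same information.
ratios = [charge / minutes for minutes, charge in zip(day_minutes, day_charge)]
spread = max(ratios) - min(ratios)

print(f"charge per minute: min={min(ratios):.4f}, max={max(ratios):.4f}")
print(f"spread across rows: {spread:.6f}")
```

If the ratio is this stable across the full dataset, one column of each minutes/charge pair could probably be dropped before modelling without losing information.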

Reproduction

Analysis started: 2021-01-31 06:26:24.342885
Analysis finished: 2021-01-31 06:27:45.520150
Duration: 1 minute and 21.18 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

State
Categorical

HIGH CARDINALITY

Distinct: 51
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 20.8 KiB

Value | Count | Frequency (%)
WV | 88 | 3.3%
MN | 70 | 2.6%
NY | 68 | 2.6%
VA | 67 | 2.5%
WY | 66 | 2.5%
OH | 66 | 2.5%
AL | 66 | 2.5%
OR | 62 | 2.3%
WI | 61 | 2.3%
NV | 61 | 2.3%
Other values (41) | 1991 | 74.7%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Account length
Real number (ℝ≥0)

Distinct: 205
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.6204051
Minimum: 1
Maximum: 243
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 36
Q1: 73
median: 100
Q3: 127
95-th percentile: 166
Maximum: 243
Range: 242
Interquartile range (IQR): 54

Descriptive statistics

Standard deviation: 39.56397365
Coefficient of variation (CV): 0.3932003018
Kurtosis: -0.1383128669
Mean: 100.6204051
Median Absolute Deviation (MAD): 27
Skewness: 0.07902340636
Sum: 268254
Variance: 1565.308011
Monotonicity: Not monotonic
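
Several of the statistics above are derived from others. As a small sketch (not the report's own implementation), the range, interquartile range, coefficient of variation and variance for Account length can be recomputed from the minimum, maximum, quartiles, mean and standard deviation reported above:

```python
# Summary values reported for "Account length" in this report.
minimum, maximum = 1, 243
q1, q3 = 73, 127
mean = 100.6204051
std = 39.56397365

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 6), round(variance, 3))
```

The same relationships should hold for the other numeric variables in this section.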
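Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
93 | 35 | 1.3%
87 | 33 | 1.2%
105 | 33 | 1.2%
101 | 32 | 1.2%
99 | 32 | 1.2%
100 | 31 | 1.2%
116 | 29 | 1.1%
106 | 29 | 1.1%
98 | 29 | 1.1%
90 | 29 | 1.1%
Other values (195) | 2354 | 88.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 6 | 0.2%
2 | 1 | < 0.1%
3 | 4 | 0.2%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
243 | 1 | < 0.1%
225 | 2 | 0.1%
224 | 2 | 0.1%
221 | 1 | < 0.1%
217 | 1 | < 0.1%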

Area code
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.8 KiB

Value | Count | Frequency (%)
415 | 1318 | 49.4%
510 | 679 | 25.5%
408 | 669 | 25.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

International plan
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.8 KiB

Value | Count | Frequency (%)
No | 2396 | 89.9%
Yes | 270 | 10.1%

Voice mail plan
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.8 KiB

Value | Count | Frequency (%)
No | 1933 | 72.5%
Yes | 733 | 27.5%

Number vmail messages
Real number (ℝ≥0)

ZEROS

Distinct: 42
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.021755439
Minimum: 0
Maximum: 50
Zeros: 1933
Zeros (%): 72.5%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 19
95-th percentile: 36
Maximum: 50
Range: 50
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 13.61227702
Coefficient of variation (CV): 1.696919972
Kurtosis: -0.04015788882
Mean: 8.021755439
Median Absolute Deviation (MAD): 0
Skewness: 1.271773633
Sum: 21386
Variance: 185.2940856
Monotonicity: Not monotonic
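Histogram with fixed size bins (bins=42)

Common values

Value | Count | Frequency (%)
0 | 1933 | 72.5%
31 | 50 | 1.9%
28 | 42 | 1.6%
29 | 39 | 1.5%
24 | 37 | 1.4%
33 | 37 | 1.4%
30 | 35 | 1.3%
27 | 34 | 1.3%
25 | 33 | 1.2%
32 | 33 | 1.2%
Other values (32) | 393 | 14.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1933 | 72.5%
4 | 1 | < 0.1%
8 | 2 | 0.1%
9 | 2 | 0.1%
10 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
50 | 2 | 0.1%
47 | 3 | 0.1%
46 | 3 | 0.1%
45 | 4 | 0.2%
44 | 7 | 0.3%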

Total day minutes
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1489
Distinct (%): 55.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.4816204
Minimum: 0
Maximum: 350.8
Zeros: 2
Zeros (%): 0.1%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 90.425
Q1: 143.4
median: 179.95
Q3: 215.9
95-th percentile: 269.775
Maximum: 350.8
Range: 350.8
Interquartile range (IQR): 72.5

Descriptive statistics

Standard deviation: 54.21035022
Coefficient of variation (CV): 0.3020384488
Kurtosis: 0.01936427966
Mean: 179.4816204
Median Absolute Deviation (MAD): 36.25
Skewness: -0.05310559809
Sum: 478498
Variance: 2938.762071
Monotonicity: Not monotonic
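Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
162.3 | 7 | 0.3%
183.4 | 7 | 0.3%
194.8 | 6 | 0.2%
175.4 | 6 | 0.2%
159.5 | 6 | 0.2%
185 | 6 | 0.2%
216 | 6 | 0.2%
145 | 5 | 0.2%
124.3 | 5 | 0.2%
141.3 | 5 | 0.2%
Other values (1479) | 2607 | 97.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2 | 0.1%
2.6 | 1 | < 0.1%
7.8 | 1 | < 0.1%
7.9 | 1 | < 0.1%
12.5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
350.8 | 1 | < 0.1%
346.8 | 1 | < 0.1%
345.3 | 1 | < 0.1%
337.4 | 1 | < 0.1%
335.5 | 1 | < 0.1%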

Total day calls
Real number (ℝ≥0)

Distinct: 115
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.3102026
Minimum: 0
Maximum: 160
Zeros: 2
Zeros (%): 0.1%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 67
Q1: 87
median: 101
Q3: 114
95-th percentile: 133
Maximum: 160
Range: 160
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 19.98816219
Coefficient of variation (CV): 0.1992635014
Kurtosis: 0.2895491547
Mean: 100.3102026
Median Absolute Deviation (MAD): 13
Skewness: -0.1282668464
Sum: 267427
Variance: 399.5266276
Monotonicity: Not monotonic
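Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
105 | 62 | 2.3%
106 | 59 | 2.2%
108 | 59 | 2.2%
112 | 58 | 2.2%
107 | 57 | 2.1%
102 | 57 | 2.1%
100 | 56 | 2.1%
104 | 55 | 2.1%
95 | 55 | 2.1%
88 | 54 | 2.0%
Other values (105) | 2094 | 78.5%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2 | 0.1%
36 | 1 | < 0.1%
40 | 1 | < 0.1%
42 | 2 | 0.1%
44 | 3 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
160 | 1 | < 0.1%
158 | 3 | 0.1%
157 | 1 | < 0.1%
156 | 1 | < 0.1%
152 | 1 | < 0.1%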

Total day charge
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1489
Distinct (%): 55.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.51240435
Minimum: 0
Maximum: 59.64
Zeros: 2
Zeros (%): 0.1%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 15.375
Q1: 24.38
median: 30.59
Q3: 36.7
95-th percentile: 45.865
Maximum: 59.64
Range: 59.64
Interquartile range (IQR): 12.32

Descriptive statistics

Standard deviation: 9.215732907
Coefficient of variation (CV): 0.3020323407
Kurtosis: 0.01950186757
Mean: 30.51240435
Median Absolute Deviation (MAD): 6.16
Skewness: -0.0530869042
Sum: 81346.07
Variance: 84.92973302
Monotonicity: Not monotonic
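Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
27.59 | 7 | 0.3%
31.18 | 7 | 0.3%
31.45 | 6 | 0.2%
29.82 | 6 | 0.2%
36.72 | 6 | 0.2%
33.12 | 6 | 0.2%
27.12 | 6 | 0.2%
24.65 | 5 | 0.2%
33.73 | 5 | 0.2%
35.29 | 5 | 0.2%
Other values (1479) | 2607 | 97.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2 | 0.1%
0.44 | 1 | < 0.1%
1.33 | 1 | < 0.1%
1.34 | 1 | < 0.1%
2.13 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
59.64 | 1 | < 0.1%
58.96 | 1 | < 0.1%
58.7 | 1 | < 0.1%
57.36 | 1 | < 0.1%
57.04 | 1 | < 0.1%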

Total eve minutes
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1442
Distinct (%): 54.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 200.386159
Minimum: 0
Maximum: 363.7
Zeros: 1
Zeros (%): < 0.1%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 118.725
Q1: 165.3
median: 200.9
Q3: 235.1
95-th percentile: 285.025
Maximum: 363.7
Range: 363.7
Interquartile range (IQR): 69.8

Descriptive statistics

Standard deviation: 50.95151512
Coefficient of variation (CV): 0.2542666388
Kurtosis: -0.02549313226
Mean: 200.386159
Median Absolute Deviation (MAD): 35
Skewness: -0.01266524296
Sum: 534229.5
Variance: 2596.056893
Monotonicity: Not monotonic
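Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
169.9 | 8 | 0.3%
220.6 | 7 | 0.3%
167.2 | 7 | 0.3%
161.7 | 7 | 0.3%
181.6 | 6 | 0.2%
195.5 | 6 | 0.2%
194 | 6 | 0.2%
224.9 | 6 | 0.2%
205.1 | 6 | 0.2%
209.4 | 6 | 0.2%
Other values (1432) | 2601 | 97.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
31.2 | 1 | < 0.1%
42.2 | 1 | < 0.1%
42.5 | 1 | < 0.1%
43.9 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
363.7 | 1 | < 0.1%
354.2 | 1 | < 0.1%
350.9 | 1 | < 0.1%
348.5 | 1 | < 0.1%
347.3 | 1 | < 0.1%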

Total eve calls
Real number (ℝ≥0)

Distinct: 120
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.0236309
Minimum: 0
Maximum: 170
Zeros: 1
Zeros (%): < 0.1%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 67
Q1: 87
median: 100
Q3: 114
95-th percentile: 133
Maximum: 170
Range: 170
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 20.16144512
Coefficient of variation (CV): 0.2015668191
Kurtosis: 0.1893960643
Mean: 100.0236309
Median Absolute Deviation (MAD): 13.5
Skewness: -0.06520928393
Sum: 266663
Variance: 406.4838691
Monotonicity: Not monotonic
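Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
105 | 64 | 2.4%
94 | 62 | 2.3%
109 | 58 | 2.2%
102 | 56 | 2.1%
108 | 55 | 2.1%
87 | 54 | 2.0%
97 | 54 | 2.0%
115 | 53 | 2.0%
111 | 52 | 2.0%
98 | 52 | 2.0%
Other values (110) | 2106 | 79.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
12 | 1 | < 0.1%
36 | 1 | < 0.1%
42 | 1 | < 0.1%
43 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
170 | 1 | < 0.1%
159 | 1 | < 0.1%
157 | 1 | < 0.1%
156 | 1 | < 0.1%
155 | 2 | 0.1%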

Total eve charge
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1301
Distinct (%): 48.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.03307202
Minimum: 0
Maximum: 30.91
Zeros: 1
Zeros (%): < 0.1%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10.0925
Q1: 14.05
median: 17.08
Q3: 19.98
95-th percentile: 24.225
Maximum: 30.91
Range: 30.91
Interquartile range (IQR): 5.93

Descriptive statistics

Standard deviation: 4.330864177
Coefficient of variation (CV): 0.2542620716
Kurtosis: -0.0255701337
Mean: 17.03307202
Median Absolute Deviation (MAD): 2.98
Skewness: -0.01262903519
Sum: 45410.17
Variance: 18.75638452
Monotonicity: Not monotonic
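Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
16.12 | 9 | 0.3%
14.25 | 9 | 0.3%
14.44 | 8 | 0.3%
18.96 | 8 | 0.3%
18.62 | 8 | 0.3%
17.43 | 8 | 0.3%
17.99 | 8 | 0.3%
18.75 | 7 | 0.3%
16.63 | 7 | 0.3%
16.97 | 7 | 0.3%
Other values (1291) | 2587 | 97.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
2.65 | 1 | < 0.1%
3.59 | 1 | < 0.1%
3.61 | 1 | < 0.1%
3.73 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
30.91 | 1 | < 0.1%
30.11 | 1 | < 0.1%
29.83 | 1 | < 0.1%
29.62 | 1 | < 0.1%
29.52 | 1 | < 0.1%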

Total night minutes
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1444
Distinct (%): 54.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201.1689422
Minimum: 43.7
Maximum: 395
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 43.7
5-th percentile: 117.925
Q1: 166.925
median: 201.15
Q3: 236.475
95-th percentile: 283.675
Maximum: 395
Range: 351.3
Interquartile range (IQR): 69.55

Descriptive statistics

Standard deviation: 50.78032337
Coefficient of variation (CV): 0.2524262583
Kurtosis: 0.05038227445
Mean: 201.1689422
Median Absolute Deviation (MAD): 34.8
Skewness: 0.02336249992
Sum: 536316.4
Variance: 2578.641241
Monotonicity: Not monotonic
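Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
214.7 | 7 | 0.3%
172.7 | 6 | 0.2%
181.2 | 6 | 0.2%
214.6 | 6 | 0.2%
193.6 | 6 | 0.2%
182.1 | 6 | 0.2%
210 | 6 | 0.2%
214 | 6 | 0.2%
197.4 | 6 | 0.2%
194.3 | 6 | 0.2%
Other values (1434) | 2605 | 97.7%

Minimum 5 values

Value | Count | Frequency (%)
43.7 | 1 | < 0.1%
45 | 1 | < 0.1%
47.4 | 1 | < 0.1%
50.1 | 2 | 0.1%
53.3 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
395 | 1 | < 0.1%
381.9 | 1 | < 0.1%
377.5 | 1 | < 0.1%
364.9 | 1 | < 0.1%
364.3 | 1 | < 0.1%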

Total night calls
Real number (ℝ≥0)

Distinct: 118
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.1061515
Minimum: 33
Maximum: 166
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 33
5-th percentile: 68
Q1: 87
median: 100
Q3: 113
95-th percentile: 131
Maximum: 166
Range: 133
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 19.41845855
Coefficient of variation (CV): 0.1939786742
Kurtosis: -0.04800868253
Mean: 100.1061515
Median Absolute Deviation (MAD): 13
Skewness: 0.01041040145
Sum: 266883
Variance: 377.0765325
Monotonicity: Not monotonic
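Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
105 | 70 | 2.6%
104 | 67 | 2.5%
91 | 60 | 2.3%
102 | 58 | 2.2%
106 | 58 | 2.2%
100 | 57 | 2.1%
96 | 54 | 2.0%
95 | 53 | 2.0%
108 | 53 | 2.0%
98 | 53 | 2.0%
Other values (108) | 2083 | 78.1%

Minimum 5 values

Value | Count | Frequency (%)
33 | 1 | < 0.1%
36 | 1 | < 0.1%
38 | 1 | < 0.1%
42 | 1 | < 0.1%
44 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
166 | 1 | < 0.1%
164 | 1 | < 0.1%
158 | 1 | < 0.1%
157 | 2 | 0.1%
156 | 2 | 0.1%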

Total night charge
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 885
Distinct (%): 33.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.052689422
Minimum: 1.97
Maximum: 17.77
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 1.97
5-th percentile: 5.31
Q1: 7.5125
median: 9.05
Q3: 10.64
95-th percentile: 12.7675
Maximum: 17.77
Range: 15.8
Interquartile range (IQR): 3.1275

Descriptive statistics

Standard deviation: 2.285119513
Coefficient of variation (CV): 0.2524243798
Kurtosis: 0.05008123142
Mean: 9.052689422
Median Absolute Deviation (MAD): 1.565
Skewness: 0.0233184743
Sum: 24134.47
Variance: 5.221771188
Monotonicity: Not monotonic
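Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
9.66 | 13 | 0.5%
8.88 | 11 | 0.4%
7.15 | 10 | 0.4%
9.63 | 10 | 0.4%
9.14 | 10 | 0.4%
10.49 | 9 | 0.3%
8.64 | 9 | 0.3%
10.35 | 9 | 0.3%
9.23 | 9 | 0.3%
10.8 | 9 | 0.3%
Other values (875) | 2567 | 96.3%

Minimum 5 values

Value | Count | Frequency (%)
1.97 | 1 | < 0.1%
2.03 | 1 | < 0.1%
2.13 | 1 | < 0.1%
2.25 | 2 | 0.1%
2.4 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
17.77 | 1 | < 0.1%
17.19 | 1 | < 0.1%
16.99 | 1 | < 0.1%
16.42 | 1 | < 0.1%
16.39 | 1 | < 0.1%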

Total intl minutes
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 158
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.23702176
Minimum: 0
Maximum: 20
Zeros: 15
Zeros (%): 0.6%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5.8
Q1: 8.5
median: 10.2
Q3: 12.1
95-th percentile: 14.7
Maximum: 20
Range: 20
Interquartile range (IQR): 3.6

Descriptive statistics

Standard deviation: 2.788348577
Coefficient of variation (CV): 0.2723788855
Kurtosis: 0.6165548282
Mean: 10.23702176
Median Absolute Deviation (MAD): 1.8
Skewness: -0.2244342469
Sum: 27291.9
Variance: 7.774887787
Monotonicity: Not monotonic
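Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
10 | 54 | 2.0%
10.2 | 47 | 1.8%
9.8 | 45 | 1.7%
11.5 | 43 | 1.6%
9.1 | 42 | 1.6%
11.3 | 42 | 1.6%
10.6 | 42 | 1.6%
9.7 | 41 | 1.5%
9.5 | 41 | 1.5%
10.9 | 41 | 1.5%
Other values (148) | 2228 | 83.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 15 | 0.6%
1.1 | 1 | < 0.1%
1.3 | 1 | < 0.1%
2.1 | 1 | < 0.1%
2.2 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
20 | 1 | < 0.1%
18.9 | 1 | < 0.1%
18.4 | 1 | < 0.1%
18.2 | 2 | 0.1%
18 | 2 | 0.1%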

Total intl calls
Real number (ℝ≥0)

Distinct: 21
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.467366842
Minimum: 0
Maximum: 20
Zeros: 15
Zeros (%): 0.6%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
median: 4
Q3: 6
95-th percentile: 9
Maximum: 20
Range: 20
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.456194903
Coefficient of variation (CV): 0.5498081958
Kurtosis: 3.266618782
Mean: 4.467366842
Median Absolute Deviation (MAD): 1
Skewness: 1.358768517
Sum: 11910
Variance: 6.032893402
Monotonicity: Not monotonic
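Histogram with fixed size bins (bins=21)

Common values

Value | Count | Frequency (%)
3 | 544 | 20.4%
4 | 503 | 18.9%
2 | 388 | 14.6%
5 | 376 | 14.1%
6 | 267 | 10.0%
7 | 172 | 6.5%
1 | 125 | 4.7%
8 | 90 | 3.4%
9 | 83 | 3.1%
10 | 37 | 1.4%
Other values (11) | 81 | 3.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 15 | 0.6%
1 | 125 | 4.7%
2 | 388 | 14.6%
3 | 544 | 20.4%
4 | 503 | 18.9%

Maximum 5 values

Value | Count | Frequency (%)
20 | 1 | < 0.1%
19 | 1 | < 0.1%
18 | 2 | 0.1%
17 | 1 | < 0.1%
16 | 2 | 0.1%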

Total intl charge
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 158
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.764489872
Minimum: 0
Maximum: 5.4
Zeros: 15
Zeros (%): 0.6%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.57
Q1: 2.3
median: 2.75
Q3: 3.27
95-th percentile: 3.97
Maximum: 5.4
Range: 5.4
Interquartile range (IQR): 0.97

Descriptive statistics

Standard deviation: 0.7528120531
Coefficient of variation (CV): 0.2723149976
Kurtosis: 0.6175371435
Mean: 2.764489872
Median Absolute Deviation (MAD): 0.49
Skewness: -0.2245685267
Sum: 7370.13
Variance: 0.5667259873
Monotonicity: Not monotonic
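Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
2.7 | 54 | 2.0%
2.75 | 47 | 1.8%
2.65 | 45 | 1.7%
3.11 | 43 | 1.6%
2.46 | 42 | 1.6%
2.86 | 42 | 1.6%
3.05 | 42 | 1.6%
2.97 | 41 | 1.5%
2.57 | 41 | 1.5%
3.08 | 41 | 1.5%
Other values (148) | 2228 | 83.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 15 | 0.6%
0.3 | 1 | < 0.1%
0.35 | 1 | < 0.1%
0.57 | 1 | < 0.1%
0.59 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5.4 | 1 | < 0.1%
5.1 | 1 | < 0.1%
4.97 | 1 | < 0.1%
4.91 | 2 | 0.1%
4.86 | 2 | 0.1%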

Customer service calls
Real number (ℝ≥0)

ZEROS

Distinct: 10
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.56264066
Minimum: 0
Maximum: 9
Zeros: 555
Zeros (%): 20.8%
Memory size: 20.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 1
Q3: 2
95-th percentile: 4
Maximum: 9
Range: 9
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.311235759
Coefficient of variation (CV): 0.8391153465
Kurtosis: 1.813987028
Mean: 1.56264066
Median Absolute Deviation (MAD): 1
Skewness: 1.095176262
Sum: 4166
Variance: 1.719339216
Monotonicity: Not monotonic
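Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 945 | 35.4%
2 | 608 | 22.8%
0 | 555 | 20.8%
3 | 348 | 13.1%
4 | 133 | 5.0%
5 | 49 | 1.8%
6 | 17 | 0.6%
7 | 8 | 0.3%
9 | 2 | 0.1%
8 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 555 | 20.8%
1 | 945 | 35.4%
2 | 608 | 22.8%
3 | 348 | 13.1%
4 | 133 | 5.0%

Maximum 5 values

Value | Count | Frequency (%)
9 | 2 | 0.1%
8 | 1 | < 0.1%
7 | 8 | 0.3%
6 | 17 | 0.6%
5 | 49 | 1.8%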

Churn
Boolean

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Value | Count | Frequency (%)
False | 2278 | 85.4%
True | 388 | 14.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
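
The calculation described above can be sketched in plain Python. This is an illustration only, not the report's implementation; the function name `pearson_r` and the sample data are made up for the example.

```python
import math

def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their sample standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# Made-up illustrative data, not taken from the report.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shifting x and rescaling it by a positive factor leaves r unchanged.
r_shifted = pearson_r([3.0 * a + 7.0 for a in x], y)
print(r, r_shifted)
```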

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
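
As a rough sketch of that recipe (rank both variables, then apply Pearson's formula to the ranks), assuming tied values receive the average of the ranks they span. The helper names and the data are hypothetical and not taken from the report.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship (y = x cubed), made up for illustration.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this data ρ reaches its maximum while Pearson's r stays below it, which is the difference the paragraph above describes.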

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
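
A minimal sketch of that pair-counting definition follows. As written, the definition corresponds to the tau-a variant, which makes no adjustment for ties; libraries such as SciPy default to tau-b, so results can differ when the data contain ties. The function name and data are made up for the example.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; tau-a counts it as neither
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Made-up data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```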

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
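
The sketch below shows one common way of writing the Bergsma (2013) correction, starting from a contingency table of counts. It is an illustration under stated assumptions: no continuity correction is applied to the chi-squared statistic, every row and column total is non-zero, and the function name and example tables are hypothetical rather than taken from the report or its software.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correct phi squared and the table dimensions for small-sample bias.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Made-up tables: one with independent rows and columns, one with perfect association.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(v_independent, v_perfect)
```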

Missing values

Sample

First rows

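Row | State | Account length | Area code | International plan | Voice mail plan | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls | Churn
0 | KS | 128 | 415 | No | Yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.70 | 1 | False
1 | OH | 107 | 415 | No | Yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.70 | 1 | False
2 | NJ | 137 | 415 | No | No | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.30 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False
3 | OH | 84 | 408 | Yes | No | 0 | 299.4 | 71 | 50.90 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False
4 | OK | 75 | 415 | Yes | No | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False
5 | AL | 118 | 510 | Yes | No | 0 | 223.4 | 98 | 37.98 | 220.6 | 101 | 18.75 | 203.9 | 118 | 9.18 | 6.3 | 6 | 1.70 | 0 | False
6 | MA | 121 | 510 | No | Yes | 24 | 218.2 | 88 | 37.09 | 348.5 | 108 | 29.62 | 212.6 | 118 | 9.57 | 7.5 | 7 | 2.03 | 3 | False
7 | MO | 147 | 415 | Yes | No | 0 | 157.0 | 79 | 26.69 | 103.1 | 94 | 8.76 | 211.8 | 96 | 9.53 | 7.1 | 6 | 1.92 | 0 | False
8 | WV | 141 | 415 | Yes | Yes | 37 | 258.6 | 84 | 43.96 | 222.0 | 111 | 18.87 | 326.4 | 97 | 14.69 | 11.2 | 5 | 3.02 | 0 | False
9 | RI | 74 | 415 | No | No | 0 | 187.7 | 127 | 31.91 | 163.4 | 148 | 13.89 | 196.0 | 94 | 8.82 | 9.1 | 5 | 2.46 | 0 | False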

Last rows

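Row | State | Account length | Area code | International plan | Voice mail plan | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls | Churn
2656 | GA | 122 | 510 | Yes | No | 0 | 140.0 | 101 | 23.80 | 196.4 | 77 | 16.69 | 120.1 | 133 | 5.40 | 9.7 | 4 | 2.62 | 4 | True
2657 | MD | 62 | 408 | No | No | 0 | 321.1 | 105 | 54.59 | 265.5 | 122 | 22.57 | 180.5 | 72 | 8.12 | 11.5 | 2 | 3.11 | 4 | True
2658 | IN | 117 | 415 | No | No | 0 | 118.4 | 126 | 20.13 | 249.3 | 97 | 21.19 | 227.0 | 56 | 10.22 | 13.6 | 3 | 3.67 | 5 | True
2659 | OH | 78 | 408 | No | No | 0 | 193.4 | 99 | 32.88 | 116.9 | 88 | 9.94 | 243.3 | 109 | 10.95 | 9.3 | 4 | 2.51 | 2 | False
2660 | OH | 96 | 415 | No | No | 0 | 106.6 | 128 | 18.12 | 284.8 | 87 | 24.21 | 178.9 | 92 | 8.05 | 14.9 | 7 | 4.02 | 1 | False
2661 | SC | 79 | 415 | No | No | 0 | 134.7 | 98 | 22.90 | 189.7 | 68 | 16.12 | 221.4 | 128 | 9.96 | 11.8 | 5 | 3.19 | 2 | False
2662 | AZ | 192 | 415 | No | Yes | 36 | 156.2 | 77 | 26.55 | 215.5 | 126 | 18.32 | 279.1 | 83 | 12.56 | 9.9 | 6 | 2.67 | 2 | False
2663 | WV | 68 | 415 | No | No | 0 | 231.1 | 57 | 39.29 | 153.4 | 55 | 13.04 | 191.3 | 123 | 8.61 | 9.6 | 4 | 2.59 | 3 | False
2664 | RI | 28 | 510 | No | No | 0 | 180.8 | 109 | 30.74 | 288.8 | 58 | 24.55 | 191.9 | 91 | 8.64 | 14.1 | 6 | 3.81 | 2 | False
2665 | TN | 74 | 415 | No | Yes | 25 | 234.4 | 113 | 39.85 | 265.9 | 82 | 22.60 | 241.4 | 77 | 10.86 | 13.7 | 4 | 3.70 | 0 | False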